Abstract. In this paper, we propose a computational approach to model the
Zone of Proximal Development (ZPD) using predicted probabilities of correctness
while engaging students in reflective dialogue. To that end, we employ a
predictive model that uses a linear function of a variety of parameters, including
difficulty and student knowledge, and we analyze the activity of students who
use a natural language tutoring system that presents conceptual reflection questions
after students solve high-school physics problems. To operationalize
our approach, we introduce the concept of the “Grey Area”, that is, the area
of uncertainty in which the student model cannot predict with acceptable accuracy
whether a student is able to give a correct answer without support. We further
discuss the impact of our approach on student modeling and the limitations of
this work, and we outline future work on systematically and rigorously evaluating
the approach.


Keywords: Natural-language tutoring systems, intelligent tutoring systems, 

1 INTRODUCTION


Intelligent Tutoring Systems (ITSs) support students in grasping concepts, applying 
them during problem solving activities, addressing misconceptions and in general 
improving students’ proficiency in science, math, reading and other areas [1]. However,
we still face the challenge of developing tutoring systems that emulate the interactive
nature of human tutoring and that are at least as effective as
human tutors. One approach to this is to engage students in reflective discussions
about scientific concepts [2]. To a large extent, these systems lack the ability to gauge 
students’ level of mastery over the curriculum that the tutoring system was designed 
to support. This is also challenging for human tutors, who gauge the level of 
knowledge and understanding of their tutees to some degree, although they are poor at 
diagnosing the causes of student errors [3]. We argue that a student model that main- 
tains and dynamically updates a representation of students’ ability level on targeted 
curriculum elements can help us address the differences between human and simulated
tutors. Tutorial dialogue systems could be more effective if they are guided by the
information about the student’s understanding of curriculum elements that is repre- 
sented within a student model, along with other student characteristics such as demo- 
graphic information and motivational traits such as interest in the targeted domain, 
self-efficacy, etc. [4]. 


1.1. Research Hypothesis and Impact 


In order to provide meaningful instruction and scaffolding to students, a tutoring sys- 
tem should appropriately adapt the learning material with respect to both content and 
presentation. A way to achieve this is to dynamically assess the students’ knowledge 
state and needs. Human tutors use their assessment of student ability to adapt the level 
of discussion to the student’s “zone of proximal development” (ZPD) [5]. Adapting 
the conversation to the ZPD would mean asking the student questions just beyond 
their knowledge level, in other words, asking questions that students are able to an- 
swer correctly with adequate support. During tutorial dialogues, human teachers eval- 
uate their students’ learning state; a teacher judges whether a student will be able to 
answer a question correctly without any help (that is, the student is above the ZPD) or 
be able to answer correctly if given some help (that is, the student is in the ZPD) or 
not be able to answer correctly despite any help (that is, the student is below the 
ZPD). Depending on this judgment, the teacher will either choose to ask this question 
or to provide additional hints or rather choose a more appropriate question for this 
student’s level. Following the practice of human tutors, we propose a computational 
approach to model the ZPD of students who carry out learning activities using a dia- 
logue-based intelligent tutoring system. We use the predictions of the student model 
to substitute for human judgment. In particular, we employ a student model to assess 
students’ changing knowledge as they engage in a dialogue with the system. In each 
step of the dialogue, the student model predicts the probability of the student being 
able to answer correctly the question posed by the computer tutor. When the predicted 
probability is high, the student is likely to possess the knowledge needed to answer 
the question correctly. When the predicted probability is low, it is unlikely that the 
student has an adequate grasp of the necessary knowledge to give a correct response. 
An interesting case arises when the student model predicts that the student will be 
able to answer a question with a probability around 50%, because in this case there is 
greater uncertainty as to whether the student will answer the question correctly or 
incorrectly. In other words, the student may need some extra support to be able to 
give a correct response. Hence the region of predicted probabilities that reflects this 
area of uncertainty with regard to the student’s ability to give a correct answer
without support is what we call the “Grey Area” [6]. 

Our research hypothesis is that we can use the fitted probabilities as predicted by the 
student model to model the ZPD. The core rationale is that if the student model cannot 
predict with acceptable accuracy whether a student will answer a question correctly, 
then it might be the case that the student is in the ZPD. To the best of our knowledge, 
this is a novel approach to modeling the ZPD, never published in the literature previ- 
ously. 


In the following section we discuss related research regarding the ZPD, Intelligent
Tutoring Systems and student modeling, and in section 3 we present our approach and
methodology. The analysis and results are presented in section 4. In section 5 we
discuss the impact and the implications of our approach, and we conclude by presenting
the limitations of our study and future work.


2 RELATED WORK 


2.1 Zone of Proximal Development 


The Zone of Proximal Development (ZPD) is one of the best known concepts in edu- 
cational psychology, defined by Vygotsky as: “the distance between the actual devel- 
opmental level as determined by independent problem solving and the level of poten- 
tial development as determined through problem solving under adult guidance or in 
collaboration with more capable peers” [5]. This definition of the ZPD points out the 
importance of appropriate assistance in relation to the learning and development pro- 
cess and thus it can be stated more simply as “the difference between what a learner 
can do without help and what he or she can do with help” [7]. Deriving ways to iden- 
tify and formally describe the ZPD is an important step towards understanding the 
mechanisms that drive learning and development, gaining insights about learners’ 
needs, and providing appropriate pedagogical interventions [8]. Approaches to identi- 
fying or modeling the ZPD typically depend on finding instances of successful assist- 
ed performance; for example, tasks that a student carries out successfully after having 
received some kind of scaffolding [8]. Various methods that derive from or build 
upon this notion have been developed for the dynamic assessment (DA) of the learn- 
ing potential of students (or learners in general) [9]. Usually these approaches employ 
tests that measure the difference between unmediated and mediated performance [10] 
or the cognitive modifiability of learners (i.e., how students’ cognitive structures 
change when they fail a task and the teacher/expert gives them help or remediation 
tasks) [11]. However, Dynamic Assessment focuses on assessing the learning or de- 
velopment potential of the learner rather than the actual level of development. Luckin 
and du Boulay proposed the use of domain knowledge representations and Bayesian 
Belief Networks (BBN) in order to model the ZPD of individual students [12]. Each 
student’s knowledge is represented as an overlay model and the student model is 
compared to the domain knowledge representation. In order to choose a learning ac- 
tivity or task, the probability of this task being in the student’s ZPD is computed by 
the BBN with respect to the tasks the student has already completed and the student’s 
performance. 


2.2 Intelligent Tutoring Systems and Student Modeling 


Intelligent Tutoring Systems (ITSs) commonly use student models to track the per- 
formance of students and choose appropriate content for practicing skills and foster- 
ing knowledge. Most student models developed for ITSs are based on the notion of 
mastery learning; that is, the student is asked to continue solving problems or answering
questions about a concept until she has mastered it. Only then will the student be
guided to move forward to other concepts [13]. Mastery learning is in line with the 
notion of learning curves; that is, how many opportunities a student needs in order to
master a skill. One could argue that mastery learning is consistent with the ZPD, in 
the sense that the student is considered to have mastered a skill when she is able to 
successfully carry out a task that requires this particular skill without help. However, 
the ZPD does not directly address mastery but rather potential “development” under 
appropriate assistance; by identifying the ZPD not only can we assess the state of a 
student’s knowledge but we also gain insight into how appropriate instruction can 
scaffold development [14]. Human tutors do not carry out detailed diagnoses of stu- 
dent knowledge and their assessments of students’ knowledge are often inaccurate 
[3]. Nonetheless, they construct and dynamically update a normative mental represen- 
tation of students’ grasp of the content under discussion, as reflected in tutors’ adap- 
tive responses to students’ need for scaffolding or remediation [15]. Similarly, a tuto- 
rial dialogue system uses a student model to adapt to the student’s needs. Otherwise, 
all students would be presented with the same topics, at the same level of detail or 
complexity. Moreover, if the student answers a question incorrectly and there is need 
for remediation, the simulated tutor will not be able to adapt the type of support that it 
provides. Indeed, it is the absence of information about the student that forces design- 
ers of tutorial dialogue systems to make a “best guess” about how to structure a dia- 
logue—that is, what the main “line of reasoning” should be, what remedial or sup- 
plemental subdialogues to issue and when—and then to hard code these guesses into 
the dialogues. Consequently, with the “one size fits all” approach to dialogue that is 
implemented in most tutorial dialogue systems, students are often under-exposed to 
material that they don’t understand and overexposed to material they have a firm 
grasp of. The first problem renders these systems ineffective in enabling students to 
achieve mastery over the focal content; the second makes them inefficient. 


3 METHODS 


3.1. Rimac: A Dialogue Tutor for Physics 


In this study we explored the proposed approach using Rimac, a web-based natural- 
language tutoring system that engages students in reflective discussions about con- 
cepts after they solve quantitative physics problems [16]. Rimac has been used suc- 
cessfully to teach physics concepts to high-school students [17]. We used data col- 
lected during three previous studies with the Rimac system to train a student model 
and predict students’ performance. The three studies were conducted within high 
school physics classes at schools in the Pittsburgh, PA area, following a similar proto- 
col. First, students took a pretest and were introduced to Rimac. Then they interacted 
with Rimac to discuss the physics knowledge associated with quantitative problems 
on dynamics. Finally, students took a post-test to measure learning gains. The tests 
aimed to test students’ conceptual understanding of physics instead of their ability to 
solve quantitative problems. Rimac’s dialogues were developed to present a directed 
line of reasoning, or DLR [18], in which the tutor presents a series of questions to the
student. If the student answers a question correctly, she advances to the next question
in the DLR. If the student responds incorrectly, the system launches a remedial sub- 
dialogue and then returns to the main line of reasoning after the sub-dialogue has 
completed. If the system is unable to understand the student’s response, it completes 
the step for the student (for more details, see [19]). The knowledge components relat- 
ed to tutor question/student response pairings are logged during the system’s interac- 
tions with students and were used to train the student model as described next. A short 
example of a dialogue with Rimac is presented in Table 1. 


Table 1. A short example of an adaptive dialogue with Rimac 


Tutor: So, can you please tell me what the vertical forces on the arrow are? 
Student: gravity 


Tutor: Very good. Since we know that the force of gravity is acting on the arrow, what
does that mean about the arrow’s (vertical) acceleration (zero, nonzero, etc)?
Student: nonzero 


3.2 The Student Model 


For this study, we modeled students’ knowledge with an Additive Factor Model (AFM),
which was introduced into ITS research by Cen et al. [20]. The model
uses logistic regression to predict the probability of a student i completing a step j
correctly as a linear function of student parameters (the student’s proficiency θ_i), skill
parameters β_k, and the learning rates of skills γ_k, as shown in equation (1). AFM takes
into account the frequency of prior practice and exposure to skills but not the correct- 
ness of responses since it assumes all students accumulate knowledge in the same 
way. In this paper we implemented the AFM model following the approach of Chi et 
al. [21] who modelled students working on physics problems using a dialogue-based 
tutor. 

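\[ \ln \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} Q_{kj} \left( \beta_k + \gamma_k N_{ik} \right) \qquad (1) \]

where Q_{kj} is 1 if step j involves skill k (and 0 otherwise) and N_{ik} is the number of
prior practice opportunities of student i on skill k.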
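
A minimal sketch of how a single AFM prediction could be computed, assuming the standard AFM form from the literature (log-odds equal to student proficiency plus, for every skill involved in the step, the skill parameter and the learning rate times the number of prior practice opportunities). The function name, argument names and all numeric values are hypothetical illustrations, not the parameters fitted in this study.

```python
import math


def afm_probability(theta_i, betas, gammas, practice_counts, q_row):
    """Probability that student i answers step j correctly under AFM.

    theta_i         -- student proficiency
    betas           -- one skill parameter per knowledge component (KC)
    gammas          -- one learning rate per KC
    practice_counts -- prior practice opportunities of the student per KC
    q_row           -- 1 if the step involves the KC, else 0
    """
    logit = theta_i
    for beta_k, gamma_k, n_ik, q_jk in zip(betas, gammas, practice_counts, q_row):
        # Only KCs involved in the step (q_jk == 1) contribute to the log-odds.
        logit += q_jk * (beta_k + gamma_k * n_ik)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical values: one step that involves only the first of two KCs.
p = afm_probability(
    theta_i=0.2,
    betas=[-0.5, 0.1],
    gammas=[0.1, 0.05],
    practice_counts=[3, 0],
    q_row=[1, 0],
)
print(round(p, 3))
```

Note that only the practice counts enter the computation, not the correctness of earlier answers, which mirrors the AFM assumption described above.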
The dataset was collected by training 291 students on Rimac over a period of 4
years (2011-2015). During students’ interactions with Rimac, they answered reflection
questions on physics problems about dynamics, such as: “In our first question we
will focus on the horizontal motion of the arrow. Let's imagine a scenario in which an
archer is standing at the edge of a high cliff. He shoots an arrow perfectly horizontally
with an initial velocity of 60 m/s off this cliff. During the arrow’s flight, how does
its horizontal velocity change (increases, decreases, remains the same, etc.)? Remember
that you can ignore air resistance”. Students worked on reflection questions about
three physics problems that explored motion laws and addressed 88 knowledge com- 
ponents (KCs). The dataset contained in total 15,644 student responses. Each student 
response answers a question posed by the tutor and was classified as correct or incor- 
rect. For the training of the model we split our dataset following an 80% rule: 12,515 
student responses were used for training the model and the remaining 3,129 were used 
for testing. On average, each student answered a total of 53 questions, stemming from 
several reflection questions. The test set contained on average 11 entries per student
(i.e., 20% of the total number of student responses; the rest were used for training the
model). We chose this training approach because we wanted to study how the same 
students represented in the training data would perform on future and sometimes un- 
seen steps, and how their knowledge level changes during the dialogue. 
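
One way such a split could be realized is sketched below. It assumes, as an interpretation rather than a documented detail of the study, that each student's responses are kept in chronological order and that the first 80% go to training while the last 20% go to testing. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def split_per_student(responses, train_fraction=0.8):
    """Split responses so every student appears in both training and test data.

    responses -- list of (student_id, step_index, correct) tuples, assumed to be
                 in chronological order for each student.
    """
    by_student = defaultdict(list)
    for row in responses:
        by_student[row[0]].append(row)

    train, test = [], []
    for rows in by_student.values():
        cut = int(len(rows) * train_fraction)
        train.extend(rows[:cut])   # earlier steps of this student
        test.extend(rows[cut:])    # later steps of the same student
    return train, test


# Hypothetical toy data: two students with 10 and 5 responses.
toy = [("s1", t, True) for t in range(10)] + [("s2", t, False) for t in range(5)]
train_rows, test_rows = split_per_student(toy)
print(len(train_rows), len(test_rows))
```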


3.3. The Grey Area and the Study Setup 


To predict correctness of students’ responses, we used the AFM student model (de- 
scribed in 3.2). Then, we classified the outcome as correct or incorrect based on the 
fitted probabilities provided by the model. In this study, the student model provided 
predictions at the step level (one step is one question/answer of the tutorial dialogue). 
A step might involve one or multiple KCs. The classification threshold in this case 
(i.e., the cutoff determining whether a response is classified as correct or incorrect) is 
0.5 and was validated using the ROC curve for the binary classifier (Fig. 1). For ex- 
ample, if the fitted probability for a step in the dialogue is 0.8 (above 0.5) then we 
expect that the student will be able to answer the corresponding dialogue step correct- 
ly; hence, it is classified as correct. Similarly, if the fitted probability for a step in the 
dialogue is 0.2 (below 0.5) then we expect that the student will not be able to answer
correctly; hence, it is classified as incorrect.

[Figure: ROC Curve for Student Model's Predictions]

Fig. 1. ROC Curve for validating the classification threshold


We show that the predicted probabilities correlate with students’ performance in 
the pre and post-tests: the student model will provide high probabilities of correctness 
(i.e., a high probability to answer a question correctly) for students who performed 
well on the pre and post-tests. Similarly, the model will provide low probabilities of 
correctness for students who performed poorly in the pre and post-tests. By showing 
that there is a correlation between students’ performance in the pre and post-tests and 
predicted probabilities, we argue that predicted probabilities are appropriate indicators 
of the ZPD. In this study the pre and post-tests assessed conceptual knowledge asso- 
ciated with the questions that students were assigned to work on. Furthermore, we 
expect that the closer the prediction is to the classification threshold, the higher the 
uncertainty of the model and thus, the higher the prediction error. In other words, 
when the student model predicts that the student will be able to answer a question 
with a probability close to 0.5, we are more uncertain than with any other prediction
as to whether or not the student will answer the question correctly. Based on our hypothesis,
the window where the prediction error is high (i.e., the “Grey Area”) can be
used to approximately model the student’s zone of proximal development. The con- 
cept of the Grey Area is depicted in Fig. 2. 

The space “Above the Grey Area” denotes the region where the student is predict- 
ed to answer correctly (the fitted probability is considerably higher than the cutoff 
threshold) and consequently may indicate the area above the ZPD; that is, the area in 
which the student is able to answer a question without any assistance. Accordingly, 
the space “Below the Grey Area” denotes the area where the student is predicted to 
answer incorrectly (the fitted probability is considerably lower than the cutoff thresh- 
old) and consequently may indicate the area below the ZPD; that is, the area in which 
the student is not able to carry out the task either with or without assistance. In this 
paper, we model the Grey Area symmetrically around the classification threshold for 
simplicity and because the classification threshold of the binary classifier was set to 0.5. However, the symmetry of
the Grey Area is something that could change depending on the classification thresh- 
old and the learning objectives. This is also the case for the size of the Grey Area. 
Fig. 2. The Grey Area concept with respect to the fitted probabilities as predicted by the student
model for a random student and for the various steps of a learning activity. Here we depict the 
Grey Area ranging from 0.4 to 0.6 and extending on both sides of the classification threshold 
(dotted line). 
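
The three regions described in this section can be expressed as a simple rule on the fitted probability. The sketch below assumes a symmetric Grey Area with a hypothetical half-width of 0.1 around the 0.5 threshold (that is, the 0.4 to 0.6 range used for illustration in Fig. 2); the function name and the region labels are illustrative only, and the half-width is a parameter precisely because the paper does not fix the size of the Grey Area.

```python
def zpd_region(p, threshold=0.5, half_width=0.1):
    """Map a fitted probability to a region relative to the Grey Area.

    Returns "grey area" when the model is uncertain (candidate for the ZPD),
    "above" when the student is predicted to answer correctly without help,
    and "below" when the student is predicted to answer incorrectly.
    """
    lower = threshold - half_width
    upper = threshold + half_width
    if lower < p < upper:
        return "grey area"
    return "above" if p >= upper else "below"


for p in (0.8, 0.55, 0.2):
    print(p, zpd_region(p))
```

An asymmetric Grey Area could be sketched the same way by passing separate lower and upper bounds instead of a single half-width.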


In this paper we present the concept of the Grey Area and the methodology to 
model the ZPD. We are exploring, but have not yet specified, several design aspects
(e.g., thresholds, the use of symmetrical or asymmetrical Grey Areas, etc.). We do not
propose a specific size but rather try out Grey Areas of different sizes and study how 
the student model behaves within these areas. We believe that the decision about the 
appropriate size (or shape) of the Grey Area is not only a modeling issue but also, and 
perhaps predominantly, a pedagogical one since it relies on the importance of the 
concepts taught, the teaching strategy and the learning objectives. That is, a teacher 
may consider that it is very important to elicit an answer even if it is predicted that the 
student is not knowledgeable about the topic or a topic might be of minor importance 
and thus even a low probability of correctness would be considered sufficient to clas- 
sify the student as knowledgeable. 


4 ANALYSIS AND RESULTS 


4.1 Model Behavior and Student Performance 


Our research hypothesis is based on the rationale that the predicted probabilities of 
the student model can provide insight into student knowledge and performance. That 
is, the fitted probabilities for a high-performing student will be higher than the fitted 
probabilities for a low-performing student. One could argue that predicted probabili- 
ties are a model’s characteristic and may not be appropriate to describe students’ per- 
formance. However, the fitted probability represents the probability that a student will 
correctly answer a dialogue question. Since high performers have a higher probability 
of correctly answering questions, the average of their fitted probabilities will be high- 
er than those of low performing students. 
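
The rationale above can be checked by averaging the fitted probabilities per student and correlating these averages with test scores. The following is a minimal sketch of that computation with a hand-written Pearson coefficient; the function names and the toy values are hypothetical and are not the study data.

```python
import math
from collections import defaultdict


def mean_fitted_probability(predictions):
    """Average fitted probability per student.

    predictions -- iterable of (student_id, fitted_probability) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for student_id, p in predictions:
        totals[student_id][0] += p
        totals[student_id][1] += 1
    return {sid: total / n for sid, (total, n) in totals.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: fitted probabilities and pre-test scores for 3 students.
preds = [("a", 0.2), ("a", 0.4), ("b", 0.6), ("b", 0.8), ("c", 0.9), ("c", 0.9)]
pretest = {"a": 3, "b": 7, "c": 9}

means = mean_fitted_probability(preds)
students = sorted(means)
r = pearson_r([means[s] for s in students], [pretest[s] for s in students])
print(round(r, 3))
```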

We performed a correlation analysis to explore this hypothesis. We correlated the 
average fitted probability (i.e., the average value of the fitted probabilities for the 
answers of each student) per user with the students’ knowledge pre-test scores. The 
correlation analysis showed that the average fitted probability correlates positively 
with the pre-test scores at a statistically significant level (Pearson’s r = 0.396**, 
p<0.01). The positive correlation was also confirmed for the post-test scores (Pear- 
son’s r = 0.46**, p<0.01). This suggests that if a student scores high on the pre-test 
for a particular KC, the model will predict that this student is able to answer a ques- 
tion that deals with this KC. Similarly, a student who was predicted to answer correct- 
ly a question dealing with a KC will also have a high post-test score for this KC. This 
finding indicates that the model can predict a student’s performance and may be fur- 
ther used to model the student’s zone of proximal development. One might notice that 
the correlations between the average fitted probabilities and the pre- and post-
knowledge test scores are not high (Pearson’s r < 0.5). However, this might be due to the
fact that the pre- and post-knowledge tests cover only a small part of the
knowledge components that are present in the dialogues. Therefore, the pre- and post-
knowledge scores can be suggestive of the student’s knowledge state but they do not
accurately represent it.

Model Accuracy for cases inside the Grey Area

The Grey Area is defined as the area where the model cannot accurately predict 
whether a student will correctly answer a particular question. To operationalize the 
grey area with respect to size and threshold, we define areas of different sizes and 
further explore the model’s behavior within these areas. For this study, we considered 
five grey areas of different sizes: Area 1 (0.45 < p < 0.55), Area 2 (0.4 < p < 0.6), Area
3 (0.35 < p < 0.65), Area 4 (0.3 < p < 0.7) and Area 5 (0.25 < p < 0.75). We chose these
particular areas so as to cover the range around the classification threshold for which 
one would expect low predictive accuracy. For these areas, we calculated how many 
times the model predicted the student answer accurately, where accuracy is defined as 
the total number of times (a) the student answered correctly and the model also pre- 
dicted the student would answer correctly and (b) the student answered incorrectly 
and the model also predicted the student would answer incorrectly divided by the total 
number of predictions. Table 2 presents the non-cumulative and the cumulative analysis
of the data. By non-cumulative analysis, we mean the analysis of the cases that
are contained only in the focal grey area under study (non-cumulative results) and
exclude the cases that are also contained in preceding areas. For example, in Area 2
we examine 420 cases that are not contained in Area 1. The cumulative analysis pre- 
sents the analysis of cases that are contained in the current area but can also be part of 
the preceding grey area (cumulative results). For example, Area 2 analyzes 814 cases 
out of which 394 are also contained in Area 1. The results of the non-cumulative
analysis show that most predicted cases fall in the non-cumulative Area 2 (the largest
increase in uncertain cases is with Area 2) and that 42.6% of them are predicted incorrectly.
This means that for 13.4% (420 cases) of the total number of cases (3,129),
the model gave a prediction with a probability from 0.4 to
0.45 or from 0.55 to 0.6. As we move away from the classification threshold (0.5), the number
of additional fitted cases tends to decrease (fewer cases are predicted with probabilities
far from the cutoff threshold) but the percentage of correct predictions
improves. This is depicted in Fig. 3. That finding was expected since the confidence
of the model increases.
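
The cumulative and non-cumulative accuracy computation described above could be sketched as follows. The area bounds are the five areas named in the text; the data layout, the function names, the toy cases and the treatment of a fitted probability of exactly 0.5 as a "correct" prediction are assumptions made for illustration.

```python
# The five grey areas from the text, narrowest first (open intervals).
AREAS = [(0.45, 0.55), (0.40, 0.60), (0.35, 0.65), (0.30, 0.70), (0.25, 0.75)]


def in_area(p, area):
    low, high = area
    return low < p < high


def accuracy(cases, threshold=0.5):
    """Share of cases where the thresholded prediction matches the student's answer."""
    if not cases:
        return None
    hits = sum((p >= threshold) == answered_correctly for p, answered_correctly in cases)
    return hits / len(cases)


def grey_area_report(cases):
    """cases -- list of (fitted_probability, answered_correctly) pairs."""
    report = []
    for idx, area in enumerate(AREAS):
        cumulative = [c for c in cases if in_area(c[0], area)]
        if idx == 0:
            non_cumulative = cumulative
        else:
            # Drop the cases already counted in the preceding (narrower) area.
            non_cumulative = [c for c in cumulative if not in_area(c[0], AREAS[idx - 1])]
        report.append({
            "area": idx + 1,
            "n_nc": len(non_cumulative), "acc_nc": accuracy(non_cumulative),
            "n_c": len(cumulative), "acc_c": accuracy(cumulative),
        })
    return report


# Hypothetical toy cases, not the study data.
toy_cases = [(0.52, True), (0.48, True), (0.58, True), (0.42, False), (0.9, True)]
report = grey_area_report(toy_cases)
for row in report:
    print(row)
```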


Table 2. Predictions’ accuracy within grey areas of different sizes. In non-cumulative analysis 
(NC) successive areas do not contain cases that were present in preceding areas (e.g., statistics 
for Area 2 do not take into account the cases contained in Area 1). In cumulative analysis (C) 
successive areas contain cases that were also present in preceding areas (e.g., statistics for Area 
2 also take into account the cases contained in Area 1) 


NC (C)                    Area 1        Area 2        Area 3        Area 4        Area 5

# Cases - NC (C)          394 (394)     420 (814)     404 (1218)    369 (1587)    304 (1891)
Cases (%) - NC (C)        12.6 (12.6)   13.4 (26.0)   12.9 (38.9)   11.8 (50.7)   9.7 (60.4)
# Correct - NC (C)        213 (213)     241 (454)     259 (713)     250 (963)     221 (1184)
# Incorrect - NC (C)      181 (181)     179 (360)     145 (505)     119 (624)     83 (707)
Correct (%) - NC (C)      54.1 (54.1)   57.4 (55.8)   64.1 (58.5)   67.8 (60.7)   72.7 (62.6)
Incorrect (%) - NC (C)    45.9 (45.9)   42.6 (44.2)   35.9 (41.5)   32.3 (39.3)   27.3 (37.4)


[Figure: Model Behavior for Grey Areas of Different Size (non-cumulative results). Bars per area (Area 1 to Area 5): % cases, predicted correctly (%), predicted incorrectly (%).]


Fig. 3. Model behavior (percentage of total number of predicted cases, cases predicted correctly 
and cases predicted incorrectly) within the five grey areas of different sizes. The areas are or- 
dered from the narrowest (Area 1) to the widest (Area 5). Each area contains cases that are not 
contained in preceding areas (non-cumulative analysis). 
For Area 1, the prediction error is higher (45.9% of the cases were not predicted
correctly) but the number of fitted cases is lower than Area 2. In Fig. 4 we depict the 
results for the cumulative analysis. As expected, more cases are predicted correctly as 
the area size increases. On one hand, choosing a narrow grey area to model the ZPD 
would limit the number of cases we scaffold since fewer cases would fall within the 
area. On the other hand, choosing a wide grey area would affect the accuracy; that is, 
some cases that could be predicted correctly would be falsely labeled as “grey”. Our 
work to date does not aim to define the appropriate size for the Grey Area but rather 
to study how the model’s behavior changes for areas of different sizes. It is worth
mentioning that for the region not covered by the five areas we study, that is,
[0, 0.25) ∪ (0.75, 1], the model predicted 89% of the cases correctly while the
overall accuracy of the model was 73%. 
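
As a quick consistency check, the two reported figures can be related through the counts in Table 2. The sketch below uses the cumulative Area 5 counts from the table and the rounded 89% figure, so the result is approximate; the variable names are illustrative.

```python
total_cases = 3129          # size of the test set
grey_cases = 1891           # cases with 0.25 < p < 0.75 (Area 5, cumulative)
grey_correct = 1184         # of which predicted correctly (Table 2)

outside_cases = total_cases - grey_cases          # cases outside all five areas
outside_accuracy = 0.89                           # reported value, rounded
outside_correct = outside_accuracy * outside_cases

# Overall accuracy implied by combining both regions.
overall_accuracy = (grey_correct + outside_correct) / total_cases
print(outside_cases, round(overall_accuracy, 2))
```

Under these rounded inputs, the implied overall accuracy agrees with the reported 73%.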


[Figure: Model Behavior for Grey Areas of Different Size (cumulative results). Bars per area (Area 1 to Area 5): % cases, predicted correctly (%), predicted incorrectly (%).]

Fig. 4. Model behavior (percentage of total number of predicted cases, cases predicted correctly
and cases predicted incorrectly) within the five grey areas of different sizes. The areas are ordered
from the narrowest (Area 1) to the widest (Area 5). Each area contains cases that are
also contained in preceding areas (cumulative analysis).


4.2 Grey Areas and Students’ Performance 


So far in this paper, we have studied how the model performs within grey areas of 
various sizes, but we have no indication of students’ performance. One could argue 
that based on the way the grey zone was modeled—that is, symmetrical around the 
cutoff threshold of the binary classifier—correct and incorrect answers should be 
balanced and not vary significantly from one zone to the other. Again, here we only 
study students’ performance; therefore “correctness” refers to the student’s answers 
(i.e., whether a student answered a question correctly) and not whether the model 
predicted correctly (i.e., whether the model predicted that the student would answer 
the way she answered). For the five grey areas defined in the previous section, we have counted the
number of correct and the number of incorrect student answers. 

In Fig. 5 we present the distribution of correct and incorrect answers over the dif- 
ferent grey areas and over correct and incorrect model predictions (as shown in the 
cumulative analysis in Fig. 4). For example, for Area 1 the model predicted 54.1% of
the cases correctly (e.g., the model predicted that a student would answer correctly
and indeed the student answered correctly, or the model predicted that a student
would answer incorrectly and indeed the student answered incorrectly). Out of these
cases, 28.7% were correct answers to the question involved and 25.4% were incorrect.
Likewise, for Area 1 the model predicted 45.9% cases incorrectly (e.g., the model 
predicted that a student would answer correctly but the student answered incorrectly, 
or the model predicted that a student would answer incorrectly but the student an- 
swered correctly). Out of these cases, 18% were correct answers to the question in- 
volved and 27.9% were incorrect. It is evident that even though the accuracy of the 
prediction changes between areas of different sizes, the distributions of correct and 
incorrect answers are similar. It can also be noted that for cases that the
model predicts correctly, the ratio of correct/incorrect answers is around 1.2 (correct
answers slightly outnumber incorrect ones). In contrast, for cases that are not predicted
correctly by the model, the ratio of correct/incorrect answers is about 0.7, signifying
that incorrect answers outnumber correct ones. Nonetheless this is a pattern
that is maintained for all of the grey areas and most probably it reveals that the stu- 
dent model tends to provide positive predictions. 
Fig. 5. Graphical representation of the distribution of correct and incorrect answers (percentage)
with respect to the model’s correct and incorrect predictions where the sizes of the various
areas are: Area 1 (0.45 < p < 0.55), Area 2 (0.4 < p < 0.6), Area 3 (0.35 < p < 0.65), Area 4 (0.3 <
p < 0.7) and Area 5 (0.25 < p < 0.75).


5 DISCUSSION


5.1 Contribution of the approach 


We envision that the contribution of the proposed approach, besides its novelty (to
the best of our knowledge there is no computational operationalization of the ZPD),
will be in defining and perhaps revising instructional methods to be implemented by
ITSs. As noted previously, the most popular instructional method used to choose
learning content (problems, activities, examples, etc.) is mastery learning. This means
that the student goes through the same concept again and again until the probability of
having mastered it is near certainty. Although mastery learning is highly effective,
and might largely account for the effectiveness of human tutoring [1], it could lead to
tedious repetition or frustration and eventually discourage the student from achieving
the goal. Choosing the “next step” is a more prominent issue in the case of
dialogue-based intelligent tutors. Not only should the task be appropriate with respect
to the background knowledge of the student, but it should also be presented in an
appropriate manner, so that the student is neither overwhelmed and discouraged (if the
task is too hard) nor bored and unchallenged (if the task is too easy). Another key
distinction from mastery learning is that the ZPD focuses on the level of help. Mastery
learning implies that help might be needed to move the student forward, but it does not
explicitly include help as part of its definition.

To address this issue, we need an assessment of the knowledge state of each student
and insight into the appropriate level of support the student needs to achieve the
learning goals. This is described by the notion of the ZPD. We claim that our approach
makes an explicit link between student modeling and the ZPD and that this approach
is a reasonable and novel operationalization of the ZPD. If we can model the ZPD,
then we can adapt our instructional strategy accordingly. For example, if a student is
above the ZPD (that is, able to solve a problem on her own and without any help), the
tutor will probably challenge the student with some questions that go beyond the
current problem’s level of difficulty. On the other hand, if a student is in the ZPD
(that is, the student needs help and appropriate scaffolding to solve the current
problem), the tutor will go slowly, perhaps clarifying step by step the knowledge the
student seems to be lacking. Finally, if a student is below the ZPD (that is, the
student completely lacks the necessary skills and will not be able to solve the
problem, either with or without help), the tutor might choose to skip this problem
or to select more appropriate (perhaps simpler) problems. Depending on the state of
the student’s knowledge, the tutorial dialogue may be directed to focus on particular
curriculum elements (facts, concepts, skills, etc.) to discuss during a given problem
and to determine the appropriate level at which to discuss these elements.


5.2 Validation of the proposed approach


In this paper, we provide preliminary support for our approach; a fuller validation is
still necessary. The challenge in doing so lies in the fact that there is no objective
way to test that a student is (or is not) in the ZPD. One heuristic that could be
used to explore this is to provide different levels of support to students using the
proposed approach and then observe the outcome. Students who are expected to be in the
ZPD and who receive appropriate scaffolding should be able to correctly answer the
questions asked by the tutor. Thus, we plan to carry out larger-scale studies where the
dialogue will adapt to the student’s knowledge according to the guidance provided by
the student model and the represented Grey Area. The dialogue adaptation will take
place on selected dialogue steps (in order to maintain the coherence of the dialogue)
and will be implemented following three basic adaptation rules:


• Students who are above the Grey Area will receive more challenging questions, receive
no help, or even skip specific parts of the dialogue that the model predicts they have
mastered;

• Students who are within the Grey Area will receive meaningful information, scaffolding
and hints related to the step in question;

• Students who are below the Grey Area will either skip the step that the model predicts
they are unable to answer or receive explicit information and instruction.
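
The three rules above could be sketched as a simple mapping from a predicted probability of correctness to a type of adaptation. This is only an illustration under stated assumptions: the `Adaptation` enum, the function name `choose_adaptation` and the default bounds (borrowed from Area 3 in Fig. 5 purely as an example) are hypothetical, and the text does not specify which Grey Area size the planned studies will use or how boundary values will be treated.

```python
from enum import Enum


class Adaptation(Enum):
    """Hypothetical labels for the three adaptation rules."""
    CHALLENGE = "more challenging questions, no help, or skip mastered parts"
    SCAFFOLD = "meaningful information, scaffolding and hints"
    INSTRUCT_OR_SKIP = "explicit information and instruction, or skip the step"


def choose_adaptation(p, lower=0.35, upper=0.65):
    """Map a predicted probability of a correct answer to an adaptation.

    lower and upper are assumed Grey Area bounds (illustrative defaults).
    Values exactly on a bound are treated as outside the Grey Area, which
    matches the strict inequalities used for the areas in Fig. 5.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p >= upper:
        return Adaptation.CHALLENGE         # above the Grey Area
    if p <= lower:
        return Adaptation.INSTRUCT_OR_SKIP  # below the Grey Area
    return Adaptation.SCAFFOLD              # within the Grey Area
```

For instance, with the default bounds, `choose_adaptation(0.5)` falls within the Grey Area and selects scaffolding, whereas `choose_adaptation(0.9)` selects the more challenging path.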


To evaluate our approach, we will study the learning gains of students who receive
different levels of support (hints, worked-out examples, explicit information, etc.)
based on their performance in pre- and post-knowledge tests and their performance
during activities within Rimac. We are optimistic that dialogue adaptation according
to the Grey Area concept will improve students’ learning gains and motivation.


6 CONCLUSION 


In this paper, we present a computational approach to model the Zone of Proximal
Development in ITSs. To that end, we introduce the concept of the “Grey Area”, that
is, the area of uncertainty in which the student model cannot predict with acceptable
accuracy whether a student is able to give a correct answer without support. It is
important to point out that we do not claim that the Grey Area is the ZPD. Instead, our
proposal is that if the model cannot predict the state of a student’s knowledge, it may
be that the student is in the ZPD.

To justify our reasoning, we used data collected from classroom studies where students
reflected on the concepts associated with physics problems, using a dialogue-based
tutoring system (Rimac). We explored the operationalization of our approach
by studying the behavior of the student model and the performance of students within
grey areas of various sizes. We found that the accuracy of the model changes depending
on the size of the Grey Area but the distribution of correct and incorrect student
responses remains fairly constant. Additionally, we showed that the average predicted
probability per student (that is, the average value of the fitted probabilities for a
particular student during her interaction with the Dialogue Tutor) has a statistically
significant positive correlation with the student’s scores on pre- and post-knowledge
tests. This suggests that the student model predictions can provide reliable
indicators of students’ performance. A limitation of our work is that we have not yet
conducted a larger-scale, rigorous evaluation of the approach; however, plans to
validate the model are being developed. Our plan is to carry out extensive studies to
explore the proposed approach to modeling the ZPD, as well as to better understand
the strengths and limitations of using a student model to guide students through
adaptive lines of reasoning.
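
The per-student average of predicted probabilities and its correlation with test scores, as summarized in this section, might be computed roughly as follows. This is a sketch under assumptions rather than the procedure used in the study: the text does not say which correlation coefficient was used, so Pearson's r is assumed here, no significance test is included, and the function names and data layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean_probability_per_student(predictions):
    """Average the fitted probabilities of each student.

    predictions: iterable of (student_id, p) pairs (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for student_id, p in predictions:
        sums[student_id] += p
        counts[student_id] += 1
    return {s: sums[s] / counts[s] for s in sums}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data, invented for the example (not taken from the study).
predictions = [("s1", 0.2), ("s1", 0.4), ("s2", 0.5),
               ("s2", 0.7), ("s3", 0.8), ("s3", 1.0)]
test_scores = {"s1": 40.0, "s2": 60.0, "s3": 80.0}

means = mean_probability_per_student(predictions)
students = sorted(means)  # fixed order so both sequences line up
r = pearson_r([means[s] for s in students],
              [test_scores[s] for s in students])
```

The same computation would be run once against pre-test scores and once against post-test scores.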